---
title: Set-of-Mark
description: Reference for the current version of the Set-of-Mark library.
pypi: cua-som
github:
  - https://github.com/trycua/cua/tree/main/libs/python/som
---

<Callout>
  A corresponding{' '}
  <a href="https://github.com/trycua/cua/blob/main/examples/som_examples.py" target="_blank">
    Python example
  </a>{' '}
  is available for this documentation.
</Callout>

## Overview

The SOM library provides visual element detection and interaction capabilities. It is based on the [Set-of-Mark](https://arxiv.org/abs/2310.11441) research paper and the [OmniParser](https://github.com/microsoft/OmniParser) model.

## API Documentation

### OmniParser Class

```python
class OmniParser:
    def __init__(self, device: str = "auto"):
        """Initialize the parser with automatic device detection"""

    def parse(
        self,
        image: PIL.Image,
        box_threshold: float = 0.3,
        iou_threshold: float = 0.1,
        use_ocr: bool = True,
        ocr_engine: str = "easyocr"
    ) -> ParseResult:
        """Parse UI elements from an image"""
```

### ParseResult Object

```python
@dataclass
class ParseResult:
    elements: List[UIElement]      # Detected elements
    visualized_image: PIL.Image    # Annotated image
    processing_time: float         # Time in seconds

    def to_dict(self) -> dict:
        """Convert to JSON-serializable dictionary"""

    def filter_by_type(self, elem_type: str) -> List[UIElement]:
        """Filter elements by type ('icon' or 'text')"""
```

### UIElement

```python
class UIElement(BaseModel):
    id: Optional[int] = Field(None)     # Element ID (1-indexed)
    type: Literal["icon", "text"]       # Element type
    bbox: BoundingBox                   # Bounding box coordinates { x1, y1, x2, y2 }
    interactivity: bool = Field(default=False)   # Whether the element is interactive
    confidence: float = Field(default=1.0)       # Detection confidence
```
